arXiv:1504.06597v1 [quant-ph] 24 Apr 2015


Characterizing errors on qubit operations via iterative randomized benchmarking 


Sarah Sheldon, Lev S. Bishop, Easwar Magesan, Stefan Filipp, Jerry M. Chow, and Jay M. Gambetta 
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598 , USA 
(Dated: April 27, 2015) 

With improved gate calibrations reducing unitary errors, we achieve a benchmarked single-qubit 
gate fidelity of 99.95% with superconducting qubits in a circuit quantum electrodynamics system. 

We present a method for distinguishing between unitary and non-unitary errors in quantum gates 
by interleaving repetitions of a target gate within a randomized benchmarking sequence. The 
benchmarking fidelity decays quadratically with the number of interleaved gates for unitary errors 
but linearly for non-unitary, allowing us to separate systematic coherent errors from decoherent 
effects. With this protocol we show that the fidelity of the gates is not limited by unitary errors, 
but by another drive-activated source of decoherence such as amplitude fluctuations. 


Accurate characterization of control gates is an essential task for developing any quantum computing device. Quantum process tomography (QPT) [1–3] has been the standard method for characterizing quantum gates because, ideally, it produces a full reconstruction of the quantum process. In practice, however, QPT suffers from many drawbacks. The most inimical are its exponential scaling in the number of quantum bits (qubits) comprising the system and its sensitivity to state preparation and measurement (SPAM) errors. Various methods such as randomized benchmarking (RB) [4–7] and gate set tomography (GST) [8, 9] have recently been developed to help overcome these limitations. RB is both insensitive to SPAM errors and efficient [10]. However, it only extracts a single piece of information, the average gate fidelity. GST, on the other hand, helps to overcome limitations from SPAM errors by reconstructing an entire library of gates in a self-consistent manner. The price paid for this self-consistent reconstruction is an even worse scaling than QPT.

As control calibration techniques continue to improve and quantum gates approach the fidelity required for fault-tolerant quantum computation, it becomes both important and difficult to verify the presence of increasingly small errors. Error verification constitutes a critical first step in a debugging routine, since different physical mechanisms can lead to different error types. QPT and GST are often poor choices for error verification. They are time consuming and contain so much information that backing out the presence of specific error types on small scales can be a challenge in itself. In addition, SPAM errors in QPT set a lower limit on the detectable error strengths [?]. At the other end of the spectrum, standard RB is efficient, but the information it contains about the gate is typically not enough to perform any sort of useful error verification. An extension of standard RB, interleaved randomized benchmarking, consists of interleaving a target gate in a benchmarking sequence and provides bounds on the error for the gate of interest [11, 12]. Interleaved benchmarking can identify gates that are poorly calibrated. However, it does not reveal whether the errors are due to decoherence, over-/underrotations, off-resonance effects, or other error types. Thus, fast and reliable routines that determine the presence of specific error types are required. Others have proposed to use RB for measuring the unital part of a quantum map [13] and correlated errors on a multi-qubit space [14], and recently Ref. [15] has proposed an alternative method for measuring unitary errors. In this paper we propose and experimentally implement a protocol, largely based on the ideas of RB, that verifies the presence of unitary versus non-unitary errors.

A major source of unitary errors in transmon qubits originates from the presence of higher levels, which can be removed by the derivative removal via adiabatic gate (DRAG) protocol [16]. To quantify this error source, we compare experimental randomized benchmarking fidelities for several gate times with two simulations, one assuming a DRAG-corrected pulse shape and the other without DRAG (Fig. 1). The measurements described here are performed on a two-qubit sample consisting of two transmon qubits coupled by a coplanar waveguide resonator, with independent readout resonators for each qubit. The qubit of interest has a transition frequency of 5.0154 GHz and an anharmonicity of −323 MHz. $T_1$ and $T_2$ are 45 ± 6 μs and 53 ± 10 μs, respectively. These characteristic times are the mean values from 500 measurements taken over 14 hours, and the error bars are the standard deviation of these data; each independent experiment is well fit by an exponential decay. The pulses used in the RB sequence are truncated Gaussian pulses with a total length equal to four times the standard deviation of the Gaussian, with the DRAG correction applied to the quadrature component.

A typical benchmarking sequence consists of a set of random Clifford gates that together compose to an identity operation [?]. Under realistic assumptions on the noise, the fidelity between the implementation of this sequence and the identity operation decays exponentially as a function of the number of Clifford gates $m$. When the fidelity decay is averaged over many realizations of the random sequence, the decay constant serves as the single metric for the average noise in the system.
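As an illustration of that last step, the sketch below fits synthetic, noise-free RB data to the standard decay model $F(m) = A\alpha^m + B$. It then converts $\alpha$ into an error per Clifford $r = (1-\alpha)(d-1)/d$ with $d = 2$. The sequence lengths and the values of $A$, $B$ and $\alpha$ are hypothetical. The fitting strategy is one simple choice among many and is not necessarily the one used for the data in this paper: it scans $\alpha$ on a grid and solves for $A$ and $B$ by linear least squares at each grid point.

```python
import numpy as np

def fit_rb_decay(m, fidelity, alpha_grid):
    """Fit F(m) = A * alpha**m + B by scanning alpha.

    For each trial alpha the model is linear in (A, B), so those are found
    with ordinary least squares; the alpha with the smallest residual wins.
    """
    best = None
    for alpha in alpha_grid:
        design = np.column_stack([alpha ** m, np.ones_like(m)])
        coef, *_ = np.linalg.lstsq(design, fidelity, rcond=None)
        rss = float(np.sum((fidelity - design @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], alpha, coef[1])
    _, A, alpha, B = best
    return A, alpha, B

# Hypothetical sequence lengths and decay parameters (not measured data).
m = np.array([1, 5, 10, 25, 50, 100, 200, 365], dtype=float)
true_A, true_alpha, true_B = 0.48, 0.999, 0.5
fidelity = true_A * true_alpha ** m + true_B

A, alpha, B = fit_rb_decay(m, fidelity, np.linspace(0.99, 0.9999, 9901))
error_per_clifford = (1 - alpha) * (2 - 1) / 2  # r = (1 - alpha)(d - 1)/d with d = 2
print(f"alpha = {alpha:.5f}, error per Clifford = {error_per_clifford:.2e}")
```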

The weak anharmonicity, $\delta$, of the transmon limits the gate fidelity as $1/\delta$, which can be seen for short gate times in Fig. 1. The experimental data outperform the non-DRAG simulation (brown dotted line in Fig. 1), showing that we have partially removed the unitary errors due to the presence of higher levels in the transmon. At the gate length $t_g = 16.7$ ns, the error rate corresponds to an average fidelity per gate of 99.95%. It has not yet reached the $T_1$- and $T_2$-limited fidelity predicted with the DRAG correction (blue solid line). With the current level of control, we can calibrate pulses to within a factor of four of the limit set by $T_1$ and $T_2$, but it is clear that there are still errors remaining in the system. (The remaining simulations in Fig. 1 will be described later in this text.)

FIG. 1. (color online) (a) Randomized benchmarking fidelity as a function of gate length. Simulated fidelity with a DRAG correction in solid blue and without in dotted brown. Experimental data (points), with the highest fidelity of 0.9995 occurring at 16.7 ns. Dashed black line: simulated fidelity when all gates are overrotated by π/64 (which would be detectable by IRB). Green dot-dashed line: simulated fidelity with gate-dependent dephasing proportional to the drive amplitude $\Omega$ (dephasing rate $\kappa\Omega$). (b) The iterative benchmarking sequence with target gate $C$ repeated $n$ times between random Clifford gates $C_i$. The case $n = 0$ corresponds to a regular randomized benchmarking sequence as used for the data in (a).
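To put the "factor of four" in context, the sketch below evaluates a standard coherence limit. For a qubit subject only to amplitude damping and dephasing for a time $t$, the average gate fidelity is $F = \tfrac{1}{6}\left(3 + e^{-t/T_1} + 2e^{-t/T_2}\right)$. The error is therefore approximately $t/(6T_1) + t/(3T_2)$. The sketch plugs in the mean $T_1$, $T_2$ and $t_g$ quoted above, which is our own simplification. It treats the whole 16.7 ns as idle decoherence and ignores how physical pulses map onto Clifford gates. The result is a rough estimate, not a reproduction of the simulated limit in Fig. 1.

```python
import math

def coherence_limited_error(t, t1, t2):
    """Average gate error of a qubit after time t with only T1/T2 decay.

    Uses F = (3 + exp(-t/T1) + 2*exp(-t/T2)) / 6 for a single qubit.
    """
    fidelity = (3 + math.exp(-t / t1) + 2 * math.exp(-t / t2)) / 6
    return 1 - fidelity

# Values quoted in the text, assumed here to act over the full gate time.
t_gate, t1, t2 = 16.7e-9, 45e-6, 53e-6
r_coh = coherence_limited_error(t_gate, t1, t2)
r_first_order = t_gate / (6 * t1) + t_gate / (3 * t2)
print(f"exact: {r_coh:.3e}, first order: {r_first_order:.3e}")
```

For these numbers the estimate is roughly 1.7 × 10⁻⁴.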

For longer pulses the fidelity is limited by the finite coherence time of the qubit. The tradeoff between decoherence and unitary errors shown in Fig. 1 is generic across quantum computing hardware. For optimal fidelity, any quantum processor will be operating with fidelity at least partially limited by unitary errors: if this were not the case, then the fidelity could surely be improved by shortening the gate time.

We extend interleaved randomized benchmarking by repeating a target Clifford $n$ times between the random Clifford gates and measuring the fidelity as a function of the number of repetitions $n$ [Fig. 1(b)]. If the gate errors are non-unitary, then the fidelity will depend only on the total length of the interleaved segment, and the resulting error per segment will thus be linear in $n$. If there are unitary errors of an over-/underrotation type, they will add coherently with $n$, and the fidelity decay will be quadratic to leading order. To see this, suppose we have a single-qubit unitary error of the form

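$$U = \exp\left(-i\,\frac{\epsilon}{2}\,\hat{r}\cdot\vec{\sigma}\right), \tag{1}$$

where $\epsilon$, $\hat{r}$, and $\vec{\sigma}$ are the error angle, the axis of rotation, and the vector of Pauli operators, respectively. Assuming $\epsilon \ll 1$, we can write $U^n$ to second order in $\epsilon$ as

$$U^n = \mathbb{1} - i\,\frac{n\epsilon}{2}\,\hat{r}\cdot\vec{\sigma} - \frac{n^2\epsilon^2}{8}\,(\hat{r}\cdot\vec{\sigma})^2 + O(\epsilon^3). \tag{2}$$

The average fidelity $F$ of the error gate compared to the identity is given by $F = \left(|\mathrm{tr}(U^n)|^2 + 2\right)/6$. Because $\hat{r}\cdot\vec{\sigma}$ is traceless and $(\hat{r}\cdot\vec{\sigma})^2 = \mathbb{1}$, Eq. (2) gives $\mathrm{tr}(U^n) = 2 - n^2\epsilon^2/4 + O(\epsilon^4)$. Writing $F$ in terms of the benchmarking parameter $\alpha = 2F - 1$ then gives [6]

$$\alpha = 1 - \frac{n^2\epsilon^2}{3} + O(\epsilon^4), \tag{3}$$

which shows the quadratic dependence on $n$. A similar analysis finds that errors due to a $T_1$ or $T_2$ process decay linearly in $n$.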
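A quick numerical check of Eq. (3) is sketched below. It computes $\alpha$ exactly from $F = (|\mathrm{tr}(U^n)|^2 + 2)/6$ for a rotation error about $\hat{x}$, which is an arbitrary choice of axis. It contrasts this with a purely incoherent error. As an assumption, that error is modeled by a depolarizing channel whose per-gate parameter $\alpha_1$ simply multiplies, $\alpha_n = \alpha_1^n$. The ratio $(1-\alpha_2)/(1-\alpha_1)$ comes out close to 4 for the coherent error and close to 2 for the incoherent one. That is the quadratic-versus-linear signature the protocol relies on.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)

def rotation_error(eps, n):
    """U^n for U = exp(-i eps/2 * sigma_x), written with cos/sin since sigma_x^2 = 1."""
    angle = n * eps
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * SIGMA_X

def alpha_unitary(eps, n):
    """Benchmarking parameter alpha = 2F - 1 with F = (|tr U^n|^2 + 2) / 6."""
    fidelity = (abs(np.trace(rotation_error(eps, n))) ** 2 + 2) / 6
    return 2 * fidelity - 1

def alpha_depolarizing(alpha_1, n):
    """Assumed incoherent model: depolarizing parameters multiply."""
    return alpha_1 ** n

eps, alpha_1 = 0.01, 0.999  # hypothetical error angle and per-gate alpha
for n in range(1, 5):
    print(n, 1 - alpha_unitary(eps, n), 1 - alpha_depolarizing(alpha_1, n))

ratio_unitary = (1 - alpha_unitary(eps, 2)) / (1 - alpha_unitary(eps, 1))
ratio_depol = (1 - alpha_depolarizing(alpha_1, 2)) / (1 - alpha_depolarizing(alpha_1, 1))
```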

We use single-sideband (SSB) modulation of our control pulses. We calibrate the in-phase/quadrature (IQ) mixers (MITEQ IRM0408LC2Q) for the chosen intermediate frequency (IF) so that only the correct sideband is produced, with minimal leakage at the carrier frequency. We then calibrate the in-phase control pulse amplitude and the amplitude of the quadrature component for the DRAG correction. The pulse amplitudes for a π-pulse ($X_\pi$) and a π/2-pulse ($X_{\pi/2}$) about the x-axis are tuned up by repeating the pulses in the sequence $X_{\pi/2} - (X_\theta)^{2n}$ in order to amplify the errors. Here $X_\theta$ is the pulse being calibrated ($\theta = \pi$ or $\pi/2$). The evolution of the qubit's Bloch vector during the first three points of this sequence is depicted in Fig. 2(a).
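The amplification can be seen in a few lines of simulation. The sketch below assumes an ideal preparation pulse, pure rotations about $x$ with no leakage, decoherence or readout error, and a fixed overrotation $\epsilon$ on each calibrated pulse; the helper names are ours. Under these assumptions the deviation of $P(|0\rangle)$ from 1/2 grows as $\sin(2n\epsilon)$, i.e., linearly in $n$ for small $\epsilon$. This matches the fit functions (4) and (5) with $a = 1/2$.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)

def rx(theta):
    """Rotation exp(-i theta sigma_x / 2) about the x axis."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * SIGMA_X

def amplification_p0(target_angle, eps, n):
    """P(|0>) after an ideal X_{pi/2} and 2n pulses of angle target_angle + eps."""
    u = rx(np.pi / 2)
    pulse = rx(target_angle + eps)
    for _ in range(2 * n):
        u = pulse @ u
    return abs(u[0, 0]) ** 2  # |<0|U|0>|^2 for initial state |0>

eps = np.pi / 256  # hypothetical overrotation
for n in range(4):
    print(n, amplification_p0(np.pi / 2, eps, n), amplification_p0(np.pi, eps, n))
```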

We correct for over- or underrotations by fitting to the measured population of the qubit ground state, $P(|0\rangle)$ [see Fig. 2(b)]. Under the assumption that the error is only an over- or underrotation, it is simple to derive a fitting formula for the amplitude calibration sequences. The fit function for the $X_{\pi/2}$ pulse in this sequence is

$$P(|0\rangle) = a + \frac{1}{2}(-1)^n\cos\left(\frac{\pi}{2} + 2n\epsilon\right), \tag{4}$$

where $a$ is left as a fit parameter and goes to 1/2 for perfect $X_{\pi/2}$ pulses. For $X_\pi$ the fit function is

$$P(|0\rangle) = a + \frac{1}{2}\cos\left(\frac{\pi}{2} + 2n\epsilon\right). \tag{5}$$

The angle error $\epsilon$ found by this fit corresponds to a gate error $r \approx \epsilon^2/6$. After fitting the error, we update the pulse amplitude accordingly.
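A minimal version of this fit-and-update loop is sketched below. It generates synthetic data from Eq. (4) with a hypothetical overrotation of π/128. It recovers $\epsilon$ by a grid search, solving for the offset $a$ in closed form at each grid point. It then rescales the amplitude, assuming as our own simplification that the rotation angle is proportional to the pulse amplitude. The scan range for $\epsilon$ is kept small enough that $2n\epsilon$ cannot wrap around for the $n$ values used.

```python
import numpy as np

def eq4_shape(n, eps):
    """n-dependent part of Eq. (4): 0.5 * (-1)^n * cos(pi/2 + 2 n eps)."""
    return 0.5 * (-1.0) ** n * np.cos(np.pi / 2 + 2 * n * eps)

def fit_overrotation(n, p0, eps_grid):
    """Return (a, eps) minimizing the squared residual of Eq. (4)."""
    best = None
    for eps in eps_grid:
        shape = eq4_shape(n, eps)
        a = np.mean(p0 - shape)  # least-squares offset for this eps
        rss = np.sum((p0 - a - shape) ** 2)
        if best is None or rss < best[0]:
            best = (rss, a, eps)
    return best[1], best[2]

def corrected_amplitude(amplitude, target_angle, eps):
    """Hypothetical update assuming rotation angle is proportional to amplitude."""
    return amplitude * target_angle / (target_angle + eps)

n = np.arange(0, 11)
true_eps = np.pi / 128
p0 = 0.5 + eq4_shape(n, true_eps)  # synthetic, noise-free data

a_fit, eps_fit = fit_overrotation(n, p0, np.linspace(-0.05, 0.05, 2001))
gate_error = eps_fit ** 2 / 6                             # r ~ eps^2 / 6
new_amp = corrected_amplitude(0.8, np.pi / 2, eps_fit)   # 0.8: arbitrary units
print(f"eps = {eps_fit:.5f} rad, r = {gate_error:.2e}, new amplitude = {new_amp:.4f}")
```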
TABLE I. AIC values for gates with no overrotation, π/256 overrotation, and π/128 overrotation for linear and quadratic model functions.

FIG. 2. Calibrations of the control pulses: (a) Bloch sphere depiction of the qubit for the first three points of the error amplification sequence described in the text. (b) The amplitude calibration for an $X_{\pi/2}$ pulse. The initial guess for the pulse amplitude has some error, which the sequence amplifies so that the deviation from 1/2 grows with $n$, the number of repeated pulses. (c) The calibration of the DRAG parameter performs the $X_{\pi/2} - X_{-\pi/2}$ sequence while varying $\lambda$, the amplitude of the derivative pulse on the quadrature channel. The correct derivative amplitude corresponds to the point where the qubit returns to the ground state.


Lastly, we determine the DRAG correction by applying the sequence $X_{\pi/2} - X_{-\pi/2}$ while varying the amplitude of the derivative pulse on the quadrature channel [Fig. 2(c)]. The final state of the qubit traces a cosine as a function of this DRAG amplitude, and we select the value that returns the qubit to the ground state, $|0\rangle$.
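One way to automate that selection is sketched below: fit $P(|0\rangle)$ versus the DRAG amplitude $\lambda$ to a cosine and take the fitted maximum. The synthetic scan is hypothetical, in arbitrary units: amplitude 0.45, an angular frequency of 3 per unit of $\lambda$, and an optimum at $\lambda = 0.2$. The fitting strategy is also our own choice. For each trial frequency on a grid, the cosine is linear in its remaining coefficients, so those come from ordinary least squares.

```python
import numpy as np

def fit_drag_optimum(lam, p0, omega_grid):
    """Fit p0 = c + b1*cos(w*lam) + b2*sin(w*lam); return the lam of maximal p0.

    For fixed w the model is linear in (c, b1, b2), so those come from least
    squares and only w is scanned on a grid.
    """
    best = None
    for w in omega_grid:
        design = np.column_stack([np.ones_like(lam), np.cos(w * lam), np.sin(w * lam)])
        coef, *_ = np.linalg.lstsq(design, p0, rcond=None)
        rss = float(np.sum((p0 - design @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, w, coef)
    _, w, (_, b1, b2) = best
    # b1*cos + b2*sin = R*cos(w*lam - phi), phi = atan2(b2, b1): maxima at w*lam = phi + 2*pi*k
    period = 2 * np.pi / w
    lam_star = np.arctan2(b2, b1) / w
    # pick the maximum closest to the centre of the scanned range
    lam_star += period * np.round((lam.mean() - lam_star) / period)
    return lam_star

# Hypothetical scan: 41 DRAG amplitudes, optimum at lam = 0.2 (arbitrary units).
lam = np.linspace(-1.0, 1.0, 41)
p0 = 0.5 + 0.45 * np.cos(3.0 * (lam - 0.2))
lam_opt = fit_drag_optimum(lam, p0, np.linspace(0.5, 6.0, 1101))
print(f"DRAG amplitude returning the qubit to |0>: {lam_opt:.4f}")
```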

The calibrated pulses are used for iterative randomized benchmarking (IRB), in which we interleave each target sequence zero to 16 times within random sequences of up to 365 Clifford gates [as depicted in Fig. 1(b)]. We average over 35 instances of each sequence and fit the decay to $A_n\alpha_n^{\ell} + B_n$, where $\ell$ is the number of Clifford gates and $n$ is the number of interleaved gates. Error bars are equal to the 95% confidence interval of this fit.
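For concreteness, the sketch below builds one such sequence for a single qubit. It enumerates the 24-element Clifford group, modulo global phase, by closing $\{H, S\}$ under multiplication. It then draws $\ell$ random Cliffords, inserts the target $n$ times after each, and appends the recovery Clifford that inverts the whole product. The generator set, the phase-canonicalization trick and the function names are our own illustrative choices, not a description of the control software used in the experiment. The target must itself be a Clifford (here $X_{\pi/2}$) for the recovery gate to exist in the group. The usage lines build a single short sequence, not the 365-gate, 35-instance sets of the experiment.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
X90 = np.array([[1, -1j], [-1j, 1]], dtype=complex) / np.sqrt(2)  # X_{pi/2}

def phase_key(u):
    """Hashable key that is identical for unitaries equal up to global phase."""
    flat = u.ravel()
    z = flat[np.argmax(np.abs(flat) > 1e-9)]  # first nonzero entry
    v = u * (abs(z) / z)                       # make that entry real and positive
    return tuple(np.round(np.concatenate([v.real.ravel(), v.imag.ravel()]), 6))

def clifford_group():
    """Close {H, S} under multiplication; returns {key: representative matrix}."""
    identity = np.eye(2, dtype=complex)
    group, frontier = {phase_key(identity): identity}, [identity]
    while frontier:
        new = []
        for u in frontier:
            for g in (H, S):
                w = g @ u
                if phase_key(w) not in group:
                    group[phase_key(w)] = w
                    new.append(w)
        frontier = new
    return group

def compose(seq):
    """Product of gates applied in time order (first gate acts first)."""
    total = np.eye(2, dtype=complex)
    for g in seq:
        total = g @ total
    return total

def iterative_rb_sequence(group, target, n, length, rng):
    """`length` random Cliffords, each followed by n copies of target, plus recovery."""
    elements = list(group.values())
    seq = []
    for _ in range(length):
        seq.append(elements[rng.integers(len(elements))])
        seq.extend([target] * n)
    seq.append(group[phase_key(compose(seq).conj().T)])  # inverse of the product
    return seq

group = clifford_group()
rng = np.random.default_rng(7)
seq = iterative_rb_sequence(group, X90, n=3, length=20, rng=rng)
print(len(group), len(seq))
```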

We performed this protocol with a 16.7 ns gate time [the time producing the minimum error per gate, Fig. 1(a)] and interleaved the targets $I$, $X_{\pi/2}$, and $Y_{\pi/2}$. For these three gates, the decay in $\alpha$ versus the number of interleaved gates is linear [Fig. 3(a)]. This is consistent with the RB data, which suggest that the unitary errors at this gate time are small.

We then intentionally add overrotation errors to the $X_{\pi/2}$ gate to determine a bound on the sensitivity of this procedure to amplitude errors. We repeat the iterative benchmarking procedure with the $X_{\pi/2}$ pulse replaced with $X_{\pi/2+\epsilon}$, where $\epsilon \in \{\pi/64, \pi/128, \pi/256\}$. The π/64 and π/128 overrotations lead to fidelities that fall off quadratically and are clearly distinguishable from gates approaching the coherence limit. The π/256 overrotation appears to have errors similar to those of the calibrated gates, giving a bound on the sensitivity to overrotation errors. Note that with infinite $T_1$ we could increase the sensitivity of this scheme by interleaving a larger number of repeated gates.

In order to quantify the amount of unitary versus non-unitary errors in the iterative randomized benchmarking data, we fit the data to both quadratic and linear models. Using the Akaike information criterion (AIC), we determine which model most accurately describes the data [17, 18]. The AIC is a useful tool for model selection and has been applied to quantum information previously [19]. For $n$ data points and $k$ fitting parameters, the AIC is given by

$$\mathcal{C} = n\ln\left(\frac{R}{n}\right) + 2k + \frac{2k(k+1)}{n-k-1}, \tag{6}$$

where $R$ is the residual sum of squares for the fit. The final term in this expression is a correction that applies when $n < 40k$; it increases the penalty for overfitting when the sample size is small. We compute $\mathcal{C}$ for three models: linear, quadratic with no linear component, and combined linear and quadratic (see Table I). The relative probability that the $i$th model is correct is


$$P_i = \exp\left(\frac{\mathcal{C}_{\min} - \mathcal{C}_i}{2}\right), \tag{7}$$


with $\mathcal{C}_{\min}$ the smallest AIC value for the set of models. The model with the best fit to the data will have $P_i = 1$. We calculate the relative probabilities of the three models for iterative randomized benchmarking data with $X_{\pi/2}$ pulses with no overrotation and with π/128 and π/256 overrotations. As detailed in Table I, the calibrated gate with no added error is best fit by a linear model, as expected when there is little unitary error present. The gate with π/256 overrotation is fit best by the combined model. The preferred model according to the AIC for the gate with π/128 error is the quadratic model, but this is in part due to the penalty placed on adding extra parameters to the fit function.
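The model comparison of Eqs. (6) and (7) is easy to reproduce on any $\alpha_n$ versus $n$ data set. The sketch below assumes each model has a free intercept, so $k = 2, 2, 3$ for the linear, quadratic-only and combined models. It fits all three by ordinary least squares and returns the relative probabilities $P_i$. The data in the usage lines are synthetic, with hypothetical slopes and a hypothetical noise level of $2\times10^{-5}$. They are not the values behind Table I.

```python
import numpy as np

def aicc(rss, n_points, k):
    """Eq. (6): AIC with the small-sample correction term."""
    return n_points * np.log(rss / n_points) + 2 * k + 2 * k * (k + 1) / (n_points - k - 1)

def model_probabilities(n, alpha):
    """Relative probabilities, Eq. (7), of three polynomial models for alpha(n)."""
    ones = np.ones_like(n)
    designs = {
        "linear": np.column_stack([ones, n]),
        "quadratic": np.column_stack([ones, n ** 2]),
        "combined": np.column_stack([ones, n, n ** 2]),
    }
    scores = {}
    for name, design in designs.items():
        coef, *_ = np.linalg.lstsq(design, alpha, rcond=None)
        rss = float(np.sum((alpha - design @ coef) ** 2))
        scores[name] = aicc(rss, len(alpha), design.shape[1])
    c_min = min(scores.values())
    return {name: float(np.exp((c_min - c) / 2)) for name, c in scores.items()}

# Synthetic alpha_n for n = 0..16 interleaved gates (hypothetical numbers).
n = np.arange(17, dtype=float)
noise = np.random.default_rng(1).normal(0.0, 2e-5, n.size)
p_lin = model_probabilities(n, 0.999 - 5e-4 * n + noise)        # incoherent-like decay
p_quad = model_probabilities(n, 0.999 - 2e-5 * n ** 2 + noise)  # coherent-like decay
print(p_lin)
print(p_quad)
```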

From this analysis it follows that a π/128 overrotation is detectable with this method and, consequently, that coherent rotation errors must be smaller than this value. We therefore simulate RB in the presence of a systematic π/64 overrotation, which would be easily detectable by IRB were it present. The simulation shows that such an overrotation is not sufficient to explain the deviation of the experiment from the simulated RB [dashed black line in Fig. 1(a)]. We conclude that there is an additional source of decoherence that is present under the continuous-driving conditions of an RB experiment. One possible form for such a non-unitary error would be a dephasing proportional to the Rabi rate of the drive. Such dephasing would result from amplitude fluctuations in the local oscillator, an amplifier, or other microwave electronics along the control line. Simulated RB in the presence of such noise (green dot-dashed line) shows reasonable agreement with the experimental data. Drive noise with a $1/f$ dependence has been measured in flux qubits [20], and such low-frequency noise has been studied in the context of randomized benchmarking [21, 22].

FIG. 3. (color online) Iterative benchmarking data for (a) a 16.7 ns gate and (b) a 10.0 ns gate. The interleaved gates are the identity (blue squares), $X_{\pi/2}$ (red circles), $Y_{\pi/2}$ (magenta diamonds), and $X_{\pi/2}Y_{\pi/2}$ (black stars). The product of $\alpha$ for $X_{\pi/2}$ and $Y_{\pi/2}$ is shown (dashed black stars) for comparison with the $X_{\pi/2}Y_{\pi/2}$ gate. Also shown in (a) are interleaved overrotations of an $X_{\pi/2}$ by π/256 (aqua triangles), π/128 (dotted green triangles), and π/64 (dashed orange triangles). The error bars are the 95% confidence interval of the fit to the IRB data averaged over 35 instances.

We notice that there is still a deviation from the best fit at the shortest gate time in Fig. 1(a). To understand the origin of this larger error rate, we calibrate gates of length 10 ns and apply IRB. For interleaved $I$, $X_{\pi/2}$, and $Y_{\pi/2}$, the iterative benchmarking data appear to decay linearly [Fig. 3(b)]. First, we notice that the error of a $Y_{\pi/2}$ gate is larger than the $X_{\pi/2}$ gate error. We attribute this to our calibration procedure. In it, the amplitude of the $Y_{\pi/2}$ pulse is assumed to equal the $X_{\pi/2}$ pulse amplitude, and sampling errors in the pulse generation are not taken into account. Second, when the interleaved sequence is $X_{\pi/2}Y_{\pi/2}$ (black stars), a larger decay is observed. This cannot be accounted for by multiplying (dashed black stars) the individual benchmarking parameters, $\alpha$, for the $X_{\pi/2}$ (red circles) and $Y_{\pi/2}$ (magenta diamonds), implying an additional error on the $X_{\pi/2}Y_{\pi/2}$ gate. (Note that, in contrast, no additional error for the $X_{\pi/2}Y_{\pi/2}$ sequence is observed for the 16.67 ns gate, for which the product of the $X_{\pi/2}$ and $Y_{\pi/2}$ parameters matches the error for $X_{\pi/2}Y_{\pi/2}$.) The $X_{\pi/2}Y_{\pi/2}$ sequence is not directly calibrated, and the presence of unitary errors here indicates a phase error. This is despite the fact that SSB modulation ensures the orthogonality of X and Y pulses by imposing a π/2 phase shift on the IF signal.

After identifying the phase error, we developed an error amplification sequence similar to those of Fig. 2 in order to quantify an X–Y axis error. The sequence is a repetition of $X_\pi Y_\pi$ within a Ramsey experiment:

$$X_{\pi/2} - (X_\pi - Y_\pi)^n - Y_{-\pi/2}.$$

The fit function for the case in which X and Y are not orthogonal is the same as that for a π/2 amplitude error, given in Eq. (4). The gate error measured by this sequence is $2\epsilon^2/3$. For an axis error $\epsilon$, the pair $X_\pi Y_\pi$ is a $z$ rotation misrotated by $2\epsilon$, so its error is $(2\epsilon)^2/6 = 2\epsilon^2/3$.
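A small simulation makes the link to Eq. (4) explicit. As an assumption, the sketch below models the axis error by tilting every Y pulse toward X by an angle $\epsilon$, so that its rotation axis is $(\sin\epsilon, \cos\epsilon, 0)$. All other imperfections are ignored, and the function names are hypothetical. The sketch checks two things.

- The deviation of $P(|0\rangle)$ from 1/2 grows with $n$, as $\tfrac{1}{2}|\sin((2n-1)\epsilon)|$ under this model. This agrees with Eq. (4) up to the sign convention for $\epsilon$ and an $n$-independent offset of one $\epsilon$, which comes from the tilted final $Y_{-\pi/2}$ pulse.
- The error of the composite $X_\pi Y_\pi$ relative to its ideal version is $\tfrac{2}{3}\sin^2\epsilon \approx 2\epsilon^2/3$.

```python
import numpy as np

PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def rotation(axis, theta):
    """exp(-i theta/2 * axis . sigma) for a unit axis (ax, ay) in the x-y plane."""
    n_dot_sigma = axis[0] * PAULI_X + axis[1] * PAULI_Y
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * n_dot_sigma

def xy_sequence_p0(eps, n):
    """P(|0>) for X_{pi/2} - (X_pi - Y_pi)^n - Y_{-pi/2} with Y tilted by eps."""
    x_axis = (1.0, 0.0)
    y_axis = (np.sin(eps), np.cos(eps))  # assumed error model: Y tilted toward X
    u = rotation(x_axis, np.pi / 2)
    for _ in range(n):
        u = rotation(x_axis, np.pi) @ u   # X_pi acts first
        u = rotation(y_axis, np.pi) @ u   # then the tilted Y_pi
    u = rotation(y_axis, -np.pi / 2) @ u
    return abs(u[0, 0]) ** 2

def xy_pair_error(eps):
    """Average gate error of the X_pi Y_pi pair relative to orthogonal axes."""
    ideal = rotation((0.0, 1.0), np.pi) @ rotation((1.0, 0.0), np.pi)
    actual = rotation((np.sin(eps), np.cos(eps)), np.pi) @ rotation((1.0, 0.0), np.pi)
    overlap = abs(np.trace(ideal.conj().T @ actual)) ** 2
    return 1 - (overlap + 2) / 6

eps = 0.01  # hypothetical axis error in radians
for n in (0, 5, 10, 20):
    print(n, xy_sequence_p0(eps, n) - 0.5)
print(xy_pair_error(eps), 2 * eps ** 2 / 3)
```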

We measure this error as a function of the buffer time between pulses for three different pulse lengths, as shown in Fig. 4. The IRB data were taken with a 3.33 ns buffer, indicated by the vertical line [with a pulse length of 13.33 ns for the data in Fig. 3(a) and 6.67 ns for Fig. 3(b)]. The gate error is $2\times10^{-5}$ for the pulse length corresponding to the 16.67 ns gate, and $3\times10^{-3}$ for the 10 ns gate. This is consistent with the IRB data, which demonstrate that an axis error is present for the 6.67 ns pulse (red squares in Fig. 4) but is not detected for the 13.33 ns pulse (violet triangles). The gate error decreases as the buffer time is increased until it levels off at around 15 ns, at which point the resolution of the fit is no better than $1\times10^{-5}$. Because the error decreases with longer buffer time, it is likely due to distortions that cause successive pulses to overlap when the time between them is insufficient. Note that this effect is not typically considered in RB, in which it is assumed that a pulse carries no memory of previous pulses in the sequence. This pulse distortion may be alleviated by further pulse shaping (as shown in [23] for pulse distortions on flux qubits) and will be the subject of future investigations.
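Inverting the relation $r = 2\epsilon^2/3$ from the text gives a feel for how large these axis errors are in angular terms. The two gate errors below are the values quoted above. The conversion assumes the small-angle relation holds exactly, and the helper name is hypothetical.

```python
import math

def axis_error_from_gate_error(gate_error):
    """Invert r = 2 eps^2 / 3 to get the X-Y axis error eps in radians."""
    return math.sqrt(1.5 * gate_error)

for label, r in (("16.67 ns gate", 2e-5), ("10 ns gate", 3e-3)):
    eps = axis_error_from_gate_error(r)
    print(f"{label}: eps = {eps:.4f} rad = {math.degrees(eps):.2f} deg")
```

This gives roughly 0.3° for the 13.33 ns pulse and 3.8° for the 6.67 ns pulse.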

We have introduced a variation of randomized benchmarking that is useful for distinguishing non-unitary from unitary errors, and we have validated this method in a superconducting qubit experiment. IRB will work for most physical unitaries without knowledge of the type of error present. Once a unitary error is discovered, one can develop a calibration sequence to reduce the error. By pushing gate lengths down and paying careful attention to calibrating the resulting unitary errors, we have achieved a benchmarked single-qubit gate fidelity of 99.95%. The error rate corresponding to this fidelity still deviates from the expected coherence limit by about a factor of four. However, our iterative randomized benchmarking data indicate that we are not limited by unitary errors at this point. We now seek to identify sources of drive-activated non-unitary errors (beyond $T_1$ and $T_2$) that must be limiting our fidelity at this time.


FIG. 4. The gate error measured as a fit to the error amplification sequence $X_{\pi/2} - (X_\pi - Y_\pi)^n - Y_{-\pi/2}$. The gate error is plotted versus buffer length for three pulse lengths: 6.67 ns in red squares, 10 ns in blue circles, and 13.33 ns in violet triangles. The buffer length used for the data taken in Fig. 3 was the shortest one shown here, 3.33 ns (indicated by the solid vertical line).